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This  is  a  primer  on  extracting  signals  from  noise  which  has 
one  potentially  new  idea  and  otherwise  has  as  its  goal  a  review  - 
very  brief  review  -  of  a  subject  with  hundreds  of  papers  to  introduce 
it,  explore  it,  develop  it,  and  exploit  it.  This  little  primer  is 
not  a  substitute  for  them.  (Some  are  listed  at  the  end  of  this  note, 
i.e.,  the  ones  we  found  useful.) 

The  general  problem  is  that  a  signal  detected  at  times 
kAt  ,  call  it  D(k) ,  is  composed  of  the  signal  s(k)  one  wishes  to 
observe  and  unwanted  additive  noise,  n(k) ,  one  would  like  to 
remove.  To  accomplish  this  we  want  to  feed  into  D(k)  a  signal  y(k) 
with  characteristics  which  isolate  s(k)  from  n(k)  according  to  some 
performance  criterion.  A  typical  performance  criterion  is  that 

N 

P  =  l  (D(k)  -  y(k)  )2  (1) 

k=»l 

be  a  minimum.  N  is  the  total  number  of  measurements  of  D. 

Two  cases  arise  naturally,  one  in  which  y(k)  is  driven 
toward  n(k)  as  P  is  minimized  (this  is  called  case  R)  and  a  second  in 
which  y(k)  Is  driven  to  s(k)  (case  B): 

(l)  Case_R.  In  case  R  we  have  a  reference  signal  coming 
directly  from  the  noise  source  itself.  An  important  example  of  this 


would  be  "own  ship"  noise  generated  by  machinery  on  a  surface  ship  or 
submarine  which  propagated  from  the  turbines  or  whatever  through 
elastic  or  waterborne  paths  to  contaminate  a  signal,  s(k),  which  an 
onboard  or  towed  sonar  array  is  trying  to  hear.  In  case  R  we  choose 
for  y(k)  a  linear  combination  of  our  knowledge  of  the  reference  noise 
r(k-l),  r(k-2),  .  .  .  r(k-L) 

L 

y(k)  -  l  W(a)  r(k-a)  ,  (2) 

a»  1 

where  r(k)  is  some  linear  combination  of  the  noise  samples  them¬ 
selves.  In  case  R  we  want  to  choose  the  weights  W(a)  so  P  is 
minimum,  i.e. 

ap  N 

-  -2  l  (D<k)  -  y(k) )  r(k  -  a)  -  0;  a-l,...L  (3) 

|  If  we  succeed  In  choosing  the  W's  to  minimize  P  in  case  R, 

then  y  has  become  our  noise  estimator  chosen  to  minimize  the 
correlation  between  D  and  y.  The  part  of  D  correlated  with  y  is  the 
I  noise  (since  we  may  assume  the  signal  s(k)  and  the  noise  n(k)  are 

uncorrelated),  so  D(k)  -  y(k)  is  our  estimate  of  the  signal  alone. 

|  (2)  Case  B.  In  this  case  we  have  a  signal  which  has  a 


correlation  function  in  time 


(4) 


C  (4)  -  jj  l  s(k  +  l)  s(k) 
k  -  1 

which  falls  off  very  slowly  in  time  compared  to  the  correlation 
function  of  the  noise 


C  (4)  -  -j  l  n(k  +  l)  n(k)  .  (5) 

n  N  k  -  1 

-4/A  -4/A 

If  the  envelope  of  C  or  C  behave  as  e  8  or  e  n  respectively, 

s  n 

case  B  is  A  »  A  .  Stated  in  other  words,  we  have  broadband  (in 
s  n 

frequency)  noise. 


In  case  B  we  want  to  choose  as  our  reference  signal,  r,  the 

detected  signal  lagged  by  a  time  A  which  satisfies 

A  «  A  «  A  .  Then  r(k  -  A)  -  s(k  -  A)  +  n(k  -  A)  is  correlated 
n  s 

with  s  but  uncorrelated  with  n.  If  we  choose  the  weights  W(a), 
a»l , . . .M  so 


M 

y(k)  -  l  W(a)  r(k  -  a  -  A) 
a  *  1 


(6) 


and  minimize  P  ,  we  are  in  effect  minimizing  any  correlation  between 
D  and  y.  Since  the  part  of  D  still  correlated,  after  lag  A,  with  y 
is  the  signal  s,  It  follows  that  y  itself  is  our  estimate  of  the 


Case  R  and  B  are  indicated  in  Figure  A. 


The  solution  to  the  optimum  filtering  problem  is  that  set 
of  WQ(a),  a  -  1....M  satisfying  (3): 


M 

CDr(a)  "  l  Vb)  CD(b,a)  • 
b  *  1 


(7) 


where 


CDr<a>  "  i 


N 

l  D(k)  r(k  -  a) 
k  -  1 


(8) 


and 


C  (b,a)  -  ^  l  r(k  -  a)  r(k  -  b)  (9) 

k  =  1 

are  the  obvious  correlation  functions.  If  the  processes  involved  are 
stationary  in  time,  then  CQ  is  a  function  only  of  |a  -  b|. 

Clearly  at  the  optimum 
M  1 

WQ(b)  -  l  Cr1(b,a)  CDr(a)  ,  (10) 

a  -  1 

which  1 s  the  classical  Wiener  filtering  solution.  To  use  this 
solution  one  needs  to  accurately  compute  the  correlation  functions 


M 

i  W(a)  r  (k-a)  =  y(k) 
«  =  1 


r(k-l),  r(k-2),... 


W 


a)  Reference  Signal  from  Probe  on  Noise  Source 

b)  Detected  Signal  D(k)  Delayed  by  A 

R)  Case  R  -  Reference  Signal  is  Filtered  Noise. 

B)  Case  B  -  Reference  Signal  is  Time  Delay  of  Detected  Signals. 


Figure  A. 


Feed  Back  to 
Correct  Weights 


-5- 


C  and  for  as  many  lags  as  filter  weights  and  then  to  do  a  M  x  M 
matrix  inversion  followed  by  a  multiplication.  All  this  is  costly  in 
computing  time.  Furthermore,  if  the  nature  of  the  signal  or  the 
noise  changes  during  the  observation  time,  the  correlation  functions 
remember  too  much  about  the  past  signal  -  so  (10)  is  not  likely  to  be 
very  useful  for  time  varying  situations;  it  doesn't  learn  enough. 


The,  by  now  almost  traditional,  approach  is  to  correct  the 
weights  at  each  time  step  by  adding  an  amount  proportional  to  the 


instantaneous  slope  3y(a~)  i*1  weight  space.  If  we  give  a  time  label 
to  the  weights  so  W^(a)  is  the  atk  weight  at  time  step  k,  then  the 
so-calle^.  IMS  adaptive  algorithm  chooses  W^  +  ^(a)  as 


Wk  +  1^  =  Wk^  +  "  y<k>)  r(k  ~  a)  •  OO 


The  parameter  p  is  up  to  the  user.  When  the  sequence  of 
W^(a)  converges,  one  can  expect  it  to  find  W^(a)  given  by  (10). 
Much  less  computation  is  involved. 

We’ll  give  an  example  of  the  use  of  (11)  in  a  few  para¬ 
graphs.  Let  us  now  comment  that  the  use  of  the  LMS  prescription  is 
numerically  tricky.  The  time  (in  number  of  "cycles”  of  the  basic 
signal)  to  learn  a  new  signal  and  choose  weights  accordingly  is 
proportional  to  p  * ;  this  encourages  the  user  to  choose  p 
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large.  The  convergence  properties  of  the  sequence,  however,  rapidly 
go  bad  as  p  becomes  too  large.  Unfortunately,  this  may  happen  at 
very  small  p  .  Although  some  of  the  references  give  some  bounds  on 
p  for  convergence,  in  practice  it  is  somewhat  cut  and  try  and 
depends  in  a  sensitive  way  on  the  scale  of  the  noise.  If  the  noise — 
or  r(k) — in  (11)  is  large,  p  must  be  correspondingly  decreased. 

My  colleague  A.  Peterson  in  JASON  indicated  to  me  that  the 
hardware  versions  of  the  LMS  algorithm  set  p  quite  large  to  reduce 
the  learning  time  and  then  monitor  the  size  of  the  W(a),  a  =  1 
to  check  for  instabilities.  If  tne  W(a)  get  outside  an  accej._able 
bound,  they  are  reinitialized. 

The  second  method  of  adaptive  cancellation  we  suggest  is  the 
one  original  piece  of  this  note.  The  information  in  the  correlation 
functions  and  C  in  (8)  and  (9)  is  contained  in  their 

correlation  lengths.  This  is  the  number  of  time  steps  in  which 
CDr(H)  ,  for  example,  falls  by  1/e  of  its  value  at  zero  lag.  So 

CDr()l)  ~  e  */L  (12) 

and  L  is  the  correlation  length. 


Our  Instantaneous  Correlation  (IC)  algorithm  does  the 


following:  At  time  step  k  compute 


and 


c  <*> 


2L  +  1  . 
1  J 


I  D(j)  r(j  -  a)  , 


(13) 


=  k  -  2L, 


C^k)(b,a) 


2L„+  1  . 
2  J 


l  r ( j  -  a)  r( j  -  b)  ,  (14) 


=  k  -  2L„ 


whi-re  and  are  correlation  lengths  to  be  adjusted. 


Next  form 


M  -1 

W(k)(a)  =  l  C<k)  (a ,b)  C<k)(b)  ,  (15) 

b  =*  1 

which  are  the  filter  weights  entering  our  determination  of  the 
signal. 


One  must  learn  by  experimentation  which  L  and  are 
appropriate.  Since  the  correlation  length  of  the  noise  is  raucn  less 
than  that  of  the  signal,  we  can  expect 

The  IC  method  may  require  a  lot  of  computing,  but  it  clearly 
learns  very  well.  As  the  signal  or  the  noise  changes,  so  will  the 
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Figure  9 
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the  adaptive  filter  through  more  cycles  of  the  signal  to  let  the 
filter  learn  better.  We  had  to  choose  p  small  and  therefore 
adaptation  times  long  to  insure  stability  of  the  algorithm.  This  is 
the  fundamental  snag  of  the  LMS  technique.  One  should  be  aware  of 
it,  not  detered  by  it. 

(2)  We  could  have  taken  a  number  of  samples  of  noisy 
signals,  filtered  them,  and  then  averaged  the  power  spectra.  This 
will  quite  likely  improve  matters,  and  is  very  straightforward  we 
simply  didn't  have  time. 

Furthermore  and  finally,  time  prevented  us  from  exploring 
the  IC  algorithm  in  any  detail.  It  should  be  easy  to  do  -  we  did  it 
running  in  a  few  hours,  but  hadn't  enough  experience  by  this  writing 
to  "learn"  from  it. 
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Next  we  corrupted  the  signal  with  gaussian  random  noise  with 


zero  mean  and  variance  unity.  The  noisy  signal  is  shown  in 
Figure  5.  Using  the  IMS  algorithm  we  then  corrected  the  noisy  signal 
using  16  weights,  an  IMS  parameter  y  =  0.01  (chosen  by  trial  and 

error  to  be  small  enough  to  be  stable),  and  a  lag  =  A  =  3.  The 

result  of  the  calculation  shown  in  Figure  6  shows  the  initial 
learning  and  then  the  corrected  signal.  In  Figure  7  we  suppressed 
the  learning  period  and  exhibit  only  the  corrected  signal.  This  is 
to  be  directly  compared  to  the  noisy  signal  in  Figure  5.  Our  final 
time  series  display,  Figure  8,  is  identical  to  Figure  7  but  with  the 
lag  =  A  =  5  which  does  not  produce  a  visible  improvement. 

Next  we  show  three  power  spectra.  Figure  9  is  the  power 
spectrum  of  Figure  7.  It  should  be  compared  with  the  spectrum  in 
Figure  2  of  the  clean  signal.  A  few  lines  are  still  visible.  In  the 

last  two  figures  (10  and  11)  we  show  the  power  spectra  of  the  even 

more  noisy  signal  (signal  variance  ■  3  times  larger)  first  without 
correction,  then  IMS  corrected  with  y  *  0.008,  16  weights,  and  a 

lag  of  25. 

What  conclusions  shall  we  draw  from  this.  Clearly,  to  the 
eye,  the  IMS  algorithm  has  cleaned  up  the  time  series  by  reducing  the 
variance  of  the  noise.  Nevertheless,  the  visible  improvement  in  the 
power  spectra  is  not  much.  Several  things  should  be  tried:  (1)  run 
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might  not.  Using  one  of  the  uttering  algorithms  (LMS  or  IC  or 
whatever)  guarantees  that  the  appropriate  correlation  is  driven  as 
small  as  possible,  thus  providing  one  with  an  objective  goal  which 
will  serve  even  when  box  averaging  fails.  Perhaps  the  way  to  look  at 
the  weighted  average  is  that  it  knows  a  little  bit  more  about  the 
noise  than  its  average.  It  also  knows  that  it  must  kill  its 
correlations  with  the  signal  and  that  it  must  reduce  the  variance  due 
to  the  noise  which  creeps,  unwanted,  into  the  estimated  signal. 


A  LITTLE  EXPERIMENT 

To  examine  the  ease  with  which  one  can  use  these  ideas  we 
created  a  signal  from  the  solution  of  a  set  of  nonlinear  differential 
equations  and  then,  by  hand,  added  guassian  random  noise  to  it. 

Using  the  LMS  algorithm  we  then  tried  to  extract  the  signal  back  out 
of  noise. 


The  signal  is  shown  in  Figure  1.  It  is  periodic  but  not 
sinusoidal.  Its  power  spectrum  is  given  in  Figure  2.  Clearly  it  has 
many  harmonics.  To  see  that  we  can  apply  Case  B  to  this  signal  we 
evaluated  its  autocorrelation  function  (unnormalized)  and  show  it  in 
Figure  3.  On  the  same  horizontal  scale  we  show  our  noise  auto¬ 
correlation  function  in  Figure  4.  An  expanded  view  of  Figure  4 

3 

indicates  that  A  "2  while  from  Figure  3  we  see  A  *  10  . 

n  s 

Clearly  Case  B  applies. 
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(k)  (k) 

C  and  the  W  ,  and  within  a  correlation  time  of  the  changed 
signal.  Further  the  method  is  intermediate  between  the  LMS  algorithm 
and  the  full  Wiener  Wq  in  amount  of  computation  required.  It  is 
stable,  while  the  IMS  approach  may  not  be. 

A  final  remark:  in  case  B  the  signal  estimate  is 

M 

y(k)  *  £  W(a)  [s(k  -  A  -  a)  +  n(k  -  A  -  a) ]  .  (16) 

a  »  1 

One  might  have  thought  to  "smooth"  the  detected  signal  by 
extracting  the  broadband  noise  through  an  averaging  procedure  over 
the  "recent”  past.  Namely,  one  might  take  as  the  the  estimate  for 
the  signal 

1  K 

s  (k)  “IF  I  [8<k  “  J)  +  n(k  ~  J)  ]•  (17) 

6  K  j  «  1 

This  kind  of  "box"  averaging  will  eliminate  high  frequency  noise  and 
is  really  a  choice  of  uniform  weights  W(a)  ■  1/K  in  our  optimiza¬ 
tion  scheme.  The  procedure  outlined  chooses  W(a)  according  to  a 
prescription  which  filters  out  noise  at  different  rates  at  different 
frequencies  to  smooth  the  detected  signal  in  an  optimum  way  over  the 
whole  spectrum  without  a^  priori  bias  as  to  the  relative  magnitude  of 
the  W(a).  As  given,  the  prescription  (17)  might  well  work  -  and  it 
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